Identical recordings on P2P network mapped onto single query result

ABSTRACT

A P2P network of digital recorders is queried about the presence of particular content that relates to a recorded broadcast program. The list of matching query results may be enormous if the program is a popular one. Therefore, the list is condensed by means of representing multiple identical ones among the results as a single item.

FIELD OF THE INVENTION

The invention relates to an apparatus and to software for sharing recorded broadcasts via a peer-to-peer (P2P) network.

BACKGROUND ART

The term P2P refers to a type of transient Internet network that allows a group of users with the same networking program to connect with each other and directly access files from one another's data storage. Distributed storage of content information on a peer-to-peer (P2P) network is discussed in, e.g., U.S. Patent Application Publication No. US20020162109 (attorney docket U.S. 018052) filed Apr. 26, 2001, for Eugene Shteyn and herein incorporated by reference. This patent document relates to an electronic content delivery system on a network of end-user devices around a hub. Each end-user device, e.g., a set-top box (STB), has storage capability. Under control of the content provider, content is stored in a distributed fashion on the network of these end-user devices for being made available to individual ones of these devices in a P2P fashion so as to cut download time and reduce transmission errors.

Various P2P configurations exist, such as a centralized configuration, a decentralized configuration and a controlled decentralized configuration. In a centralized configuration, the system depends on a central server that directs the communication between peers. “Napster” is an example of a centralized configuration. A decentralized configuration has no central server, and each peer is capable of acting as a client, as a server or as both. A user connects to the decentralized network by connecting to another user who is already connected. “Gnutella” and “Kazaa” are examples of decentralized networks. In a controlled decentralized configuration, a user may act as a client, as a server or as both, as in the decentralized configuration, but specific operators control which user is allowed to access which particular server. “Morpheus” is an example of the latter. For a brief discussion of P2P network architectures see, e.g., “Stretching The Fabric Of The Net: Examining the present and potential of peer-to-peer technologies”, Software & Information Industry Association (SIIA), 2001.

“Kazaa”, mentioned above, enables the sharing of files. “Kazaa Media Desktop” (KMD) software installed at an end-user enables that user to connect to other KMD users. The software provides a search functionality to search for particular content shared by other KMD users. The searches are run via specific KMD users, referred to as Supernodes, who have fast connections and powerful computers. A Supernode indexes the content available at users connected to it. Upon locating the desired file, KMD enables the user to download the file directly from the user who has it. In order to identify content within KaZaa, each file is provided with a meta-tag that represents the fingerprint of the file content. Files with identical content have an identical Message Digest value calculated using cryptographically secure MD5 hashing of the content, see, e.g., “KaZaA P2P FastTrack File Formats” at <http://kzfti.cjb.net> or at <http://home.hetnet.nl/˜frejon55/ft/KazaaFileFormats.html>.

“Morpheus”, mentioned above, uses metadata with XML format descriptors that specify the content of the relevant file. Accordingly, files can be searched by attributes such as title, artist, category, etc. Descriptors are derived automatically from the file's metadata, or are provided by the user via the application's file import wizard.

SUMMARY OF THE INVENTION

The inventors have realized that using a content hash as identifier has drawbacks when the content relates to a recording of, e.g., a broadcast, that is made available to other users on a P2P network. For example, different recorders may have recorded the same broadcast program, but one recorder started recording a few seconds earlier than the other and, e.g., recorded the announcement that preceded the program itself as well. In another example, to fit a program within the available time slot at a first broadcast station, not all frames are broadcast (without the viewer noticing this), whereas a second station broadcasts the same program with all frames. In both examples, the semantically identical programs get different hash values and therefore get different identities. As a result, an inventory of recorded content based on hash values is not practical, as a search returns multiple hits that are basically identical programs. If the content comprises a recorded broadcast program that was highly popular, the number of hits returned can be very high, which clutters the graphical user-interface (GUI) rendered on a display monitor and confuses the end-user. Searching files based on user-provided descriptors is not ideal either: the descriptors for the same content may not be identical as a result of language, typographical errors or mere subjectivity.
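The drawback of hash-based identification can be illustrated with a minimal sketch. The byte strings below are hypothetical stand-ins for recorded streams, and the helper name md5_fingerprint is an assumption made for illustration only; the sketch merely shows that a recording that starts a few seconds earlier yields a different MD5 digest although the program itself is the same.

```python
import hashlib


def md5_fingerprint(data: bytes) -> str:
    """Return the MD5 digest of the given content as a hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-ins for recorded streams (not real broadcast data).
program = b"PROGRAM-FRAMES" * 1000
announcement = b"ANNOUNCEMENT"

recording_a = program                 # recorder A started exactly on time
recording_b = announcement + program  # recorder B started a few seconds early

# The program is the same, but the two digests differ, so a hash-based
# inventory would treat the two recordings as different content.
digests_match = md5_fingerprint(recording_a) == md5_fingerprint(recording_b)
print(digests_match)
```

An identifier that is assigned to the program rather than computed from the recorded bytes does not depend on where a particular recording starts or ends.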

The inventors have therefore realized that, especially with regard to recorded broadcast content shared on a P2P network, the user interface is to be made more user-friendly and more ergonomic.

To this end, the inventors propose to cluster the returned hits so as to represent to the user multiple identical ones among a plurality of hits as a single item. More specifically, an embodiment of the invention relates to a consumer electronics (CE) apparatus that has a network connection for a P2P network of recorders. The apparatus has an operational mode for querying the network about specific content recorded from a broadcast. The apparatus presents multiple identical ones among a plurality of query results as a single item. The query itself is accomplished using any appropriate method, including conventional ones as used on the known P2P networks. The query analyzes the metadata of the recorded content available at the peers and returns the results. The metadata comprises data descriptive of the content, e.g., a title, the cast in case of a movie or play, etc. The input entered to start the query is used to find matching information in the metadata. The metadata of a content file further comprises an identifier of the content. Discriminating between different pieces of content matching the query criterion is based on each different one of the plurality of query results being characterized by a respective identifier. This identifier is comprised in the metadata recorded with the content as available on the P2P network. If there are multiple hits among the query results that have the same content identifier, the apparatus lists these multiple hits as a single item.

Preferably, the CE apparatus comprises a digital recorder for recording broadcast content, and has a further operational mode for downloading the specific content found through querying the peers on the P2P network, at least partly from one of the peers. Other parts of the specific content may be downloaded from other peers, e.g., in order to balance network load or recorder load.
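By way of illustration only, downloading different parts from different peers could be planned as sketched below. The round-robin policy, the function name assign_chunks and the peer labels are assumptions that do not appear in this description; the sketch further assumes that the peers hold copies that can be divided into the same numbered parts.

```python
def assign_chunks(num_chunks, peers):
    """Spread chunk indices 0..num_chunks-1 over the peers in round-robin order."""
    if not peers:
        raise ValueError("at least one peer is required")
    plan = {peer: [] for peer in peers}
    for chunk in range(num_chunks):
        # Consecutive chunks go to consecutive peers, so the load is spread evenly.
        plan[peers[chunk % len(peers)]].append(chunk)
    return plan


# Hypothetical peer labels; five parts spread over two peers.
plan = assign_chunks(5, ["peer-1", "peer-2"])
print(plan)
```

Other policies, e.g., weighting the peers by available bandwidth, fit the same interface.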

The identifier used to cluster identical query results comprises, e.g., a V-ISAN (Versioned-International Standard Audiovisual Number). The V-ISAN format builds on ISO's original concept of the ISAN (International Standard Audiovisual Number). The V-ISAN serves to uniquely identify audio-visual works. The V-ISAN allows comparisons between V-ISANs to determine whether two pieces of content differ only by being a different version of the same root work or are different episodes of the same series. Another example of a content identifier is the CRID (Content Reference ID) used in the TV-Anytime concept. As explained further below, the CRID is an identifier assigned by an authority to a specific piece of content. CRIDs comply with a hierarchical format that makes it possible to represent relationships between pieces of content. For more information on TV-Anytime and CRIDs see, e.g., Document SP002v1.2 “Specification Series: S-2 on: System Description (Informative with mandatory Appendix B)”, Apr. 5, 2002; and U.S. Patent Application Publication No. US 20020038352 (attorney docket GB 000132) HANDLING BROADCAST DATA TOKENS filed for Alexis Ashley.

Another embodiment of the invention relates to software for being installed on a network-enabled CE apparatus for enabling the apparatus to query a P2P network of digital recorders. The software renders the apparatus operational for querying the network about specific content recorded from a broadcast and for presenting multiple identical ones among a plurality of query results as a single item in an appropriate user interface, e.g., on a display monitor.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is explained in further detail, by way of example and with reference to the accompanying drawing wherein:

FIG. 1 is a diagram illustrating process steps of the invention; and

FIG. 2 is a block diagram of a system in the invention.

Throughout the figures, same reference numerals indicate similar or corresponding features.

DETAILED EMBODIMENTS

In a P2P network of DVRs, the users can search for content and share recorded content with each other via this network. Peers (users) can create a community and publish content within that group for the purpose of sharing. Broadcasters, or other third parties, e.g., content providers, can create communities as well. When searching for a particular piece, or type, of content, many of the search results may be identical, e.g., as a consequence of the same content having been recorded from the same broadcast at multiple users. A user conducting a search is primarily interested in semantically different results (i.e., in different pieces of content that match the same search criteria) instead of in a list containing many, e.g., thousands, of entries of the same pieces of content. The invention seeks to solve this problem as illustrated in FIG. 1.

FIG. 1 is a diagram that illustrates the steps in a process 100 according to the invention. In step 102 the user enters, through some suitable interface, keywords for querying content on the P2P network. In step 104 the metadata of the content available from peers on the P2P network is matched against the keywords entered. The interface through which the user is to specify his/her query criterion is preferably preformatted so as to take the format and segmentation of the metadata into account. For example, the metadata comprises a field “title of the piece of content”. The user interface then preferably has an entry “title” wherein the user can specify keywords that he/she expects to occur in the title of the piece of content sought. In step 106, information about the matching query results is returned to the user. This information comprises a content identifier and a network address for each match. In step 108, the query results that have the same identifier are clustered. In step 110, the user is presented with a list of the query results in such a manner that the clustered results are represented as a single item.
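Steps 104 to 110 of process 100 can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the class names Hit and Item, the functions match and cluster, and the sample identifiers, titles and addresses are all hypothetical, and the matching is reduced to a case-insensitive substring test on the title field.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    content_id: str  # content identifier taken from the metadata, e.g., a CRID
    title: str       # title field of the metadata
    address: str     # network address of the peer holding the recording


@dataclass
class Item:
    content_id: str
    title: str
    addresses: list = field(default_factory=list)  # all peers holding this content


def match(records, keyword):
    """Steps 104-106: return the records whose title contains the keyword."""
    needle = keyword.lower()
    return [r for r in records if needle in r.title.lower()]


def cluster(hits):
    """Step 108: group hits that share a content identifier into one item."""
    items = {}  # dict keeps insertion order, so the first-seen order is preserved
    for hit in hits:
        item = items.setdefault(hit.content_id, Item(hit.content_id, hit.title))
        item.addresses.append(hit.address)
    return list(items.values())


# Hypothetical metadata records as they might be available from three peers.
records = [
    Hit("id-1", "Evening News", "peer-a"),
    Hit("id-1", "Evening News", "peer-b"),
    Hit("id-2", "News Special", "peer-b"),
    Hit("id-3", "Cooking Show", "peer-c"),
]

# Step 110: present one line per item, with the number of clustered results.
for item in cluster(match(records, "news")):
    print(f"{item.title}: available from {len(item.addresses)} peer(s)")
```

Keeping the list of addresses per item also gives the count of identical results for each item and leaves the choice of download source open.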

An example of an identifier that can be used for clustering identical query results is the TV-Anytime CRID, as mentioned above. The TV-Anytime forum aims to specify a set of industry-wide standards for Digital Video Recorders (DVRs), also referred to as Personal Video Recorders (PVRs). A PVR is a video recorder with a hard disk for video storage. Phase One of TV-Anytime enables audio and video search, capture and playback of content. It also enables segmentation and indexing of that content. Phase Two will specify open standards that build on the foundations of Phase One specifications and will include areas such as targeting, redistribution and new content types. Content redistribution includes moving content around among devices and systems. Examples of redistribution are, e.g., content sharing, home networking and removable media. Content sharing is the P2P distribution of content over provider networks. Home networking relates to the sharing of content among multiple storage devices and display terminals within a defined private physical network. Removable media are involved in the redistribution of content on physical storage such as optical discs, flash cards, etc.

One feature of the TV-Anytime specifications is content referencing. This specification provides the ability to map a unique identifier of a piece of content, such as a TV program, onto a time and/or location (e.g., TV channel) where this piece of content can be acquired. The identifier is called a CRID (“content reference ID”). In the terminology of TV-Anytime, an organization that creates CRIDs is called an “authority”. There can be any number of authorities producing CRIDs, but each authority is uniquely identified by a name. The TV-Anytime standard uses the DNS name registration system to ensure that these names are unique. Each CRID has the name of the authority that issued it embedded in the CRID, and there is accordingly a requirement for a means to take an authority name from a CRID and find the server on the Internet where the CRID can be converted to a location.
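Taking the authority name from a CRID can be sketched as follows. The sketch assumes the URI-style syntax crid://<authority>/<data>, which is not spelled out in this description, and the function name crid_authority and the sample CRID are hypothetical. Locating the server that converts the CRID to a location is outside the scope of the sketch.

```python
from urllib.parse import urlsplit


def crid_authority(crid: str) -> str:
    """Return the authority name embedded in a CRID.

    Assumes the form crid://<authority>/<data>.
    """
    parts = urlsplit(crid)
    if parts.scheme != "crid" or not parts.netloc:
        raise ValueError(f"not a CRID of the assumed form: {crid!r}")
    # The authority is a DNS-style name, so it is compared in lower case.
    return parts.netloc.lower()


# Hypothetical CRID issued by a hypothetical authority.
print(crid_authority("crid://broadcaster.example/drama/episode1"))
```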

In an embodiment of the invention the TV-Anytime CRIDs are used to eliminate duplicates. A given piece of content that originates from the same content creator (authority) will have the same CRID. The user will be presented only with the different results from the responses to his/her query. The results that are identical are grouped together and presented to the user as a single result in a GUI. This way, the user only sees the semantically different results of his/her search request. If a user records a piece of content, its CRID will be attached to the recording, so all recorders that record that piece will have the same CRID attached to it. Now, if the user is interested in one of the results of his/her query, the recorder can either choose one from among the identical results, or present the user with a list of sources from which the content is available. The latter can give the user the option to decide between the sources based on, for example, how much it costs to download the content (in a pay-per-view model), if this is applicable. Alternatively, the user's system determines automatically from which resource or resources to download the content in order to, e.g., optimize bandwidth usage, network load, data traffic, etc.

FIG. 2 is a block diagram of a P2P system 200 in the invention. System 200 comprises a CE apparatus 202, a data network 204, and a plurality of data storage devices 206, 208, . . . , and 210. Network 204 connects apparatus 202 to each of storage devices 206-210. In this example, each of devices 206-210 comprises a respective DVR for recording content that is being broadcast or otherwise made available to the user of the respective DVR. CE apparatus 202 has a first operational mode wherein it is enabled to query program inventories 212, 214, . . . , and 216 of devices 206-210, respectively. Inventories 212-216 are automatically established based on, e.g., the metadata recorded with the programs, or based on the electronic program guide (EPG) used to program recorders 206-210. Inventories 212-216 include content identifiers, here the CRIDs, and further descriptive information such as the titles.

Assume that the user queries P2P system 200 about content that has a certain keyword in its title as represented in its metadata. Assume now that the matching query results refer to “title A” in inventories 212, 214 and 216, and to “title H” in inventory 216. The user would be presented with four hits in a conventional approach. In the invention, CE apparatus 202 also takes the CRIDs into account in order to present normalized results to the user. The three hits for “title A” all have the same identifier “CRID1”. The user of apparatus 202 now sees in a GUI 218 of apparatus 202 only two results: “title A” and “title H”. If the user wishes to download the content associated with title A, he/she clicks on “title A” in GUI 218. Apparatus 202 can now proceed to select any method of downloading the associated content. For example, apparatus 202 chooses to download from device 206 because device 206 is fewer network hops away than devices 208 and 210. All this is transparent to the user of apparatus 202.
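The automatic choice of a download source for the clustered result “title A” can be sketched as follows. The hop counts, the cost field, the class name Source and the function name pick_source are hypothetical; the rule shown, lowest cost first and fewest network hops as tie-breaker, is only one of the selection methods the apparatus might apply.

```python
from dataclasses import dataclass


@dataclass
class Source:
    device: str        # reference numeral of the storage device in FIG. 2
    hops: int          # network hops between apparatus 202 and the device
    cost: float = 0.0  # download cost, e.g., in a pay-per-view model


def pick_source(sources):
    """Choose the cheapest source; among equally cheap ones, the nearest."""
    if not sources:
        raise ValueError("no source available for this item")
    return min(sources, key=lambda s: (s.cost, s.hops))


# The three devices that hold "title A"; the hop counts are made up.
title_a_sources = [
    Source(device="208", hops=4),
    Source(device="206", hops=2),
    Source(device="210", hops=5),
]

chosen = pick_source(title_a_sources)
print(f"download title A from device {chosen.device}")
```

With the made-up hop counts above and no cost differences, device 206 is selected, in line with the example in the text.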

In an embodiment of the invention, the functionality of apparatus 202 relating to the querying and to the condensed representation of the query results is implemented by means of software 220 installed on, e.g., a PC, an STB, or an interactive TV, etc. For example, this software 220 comes on top of conventional P2P equipment used for sharing files. As noted above, if the files relate to recorded broadcasts of popular programs, the presentation of query results may lead to huge lists. The software in the invention makes it possible to condense the list of query results to a manageable length by means of mapping identical results relating to different locations (peers) onto a single entry in the list.

1. A CE apparatus having a network connection for a P2P network, and having an operational mode for querying the network about specific content recorded from a broadcast and for presenting multiple identical ones among a plurality of query results as a single item.
2. The CE apparatus of claim 1, wherein each different one of the plurality of query results is characterized by a respective identifier comprised in recorded metadata.
3. The CE apparatus of claim 2, wherein the respective unique identifier comprises a respective CRID.
4. The CE apparatus of claim 1, comprising a digital recorder for recording broadcast content.
5. The CE apparatus of claim 1, having a further operational mode for downloading the specific content from the P2P network.
6. Software for being installed on a network-enabled CE apparatus for enabling to participate in a P2P network, the software rendering the apparatus operational for querying the network about specific content recorded from a broadcast and for presenting multiple identical ones among a plurality of query results as a single item.
7. The software of claim 6, operative to differentiate among the query results based on content identifiers in metadata.
8. The software of claim 7, wherein the content identifiers are based on CRIDs.
9. The CE apparatus of claim 1, wherein for the single item the multiple identical ones among the plurality of query results are counted.
10. A method for use on a Peer-to-Peer network, the method comprising enabling to query the network about specific content recorded from a broadcast and to present multiple identical ones among a plurality of query results as a single item.
11. The method of claim 10, wherein each different one of the plurality of query results is characterized by a respective identifier comprised in recorded metadata.
12. The method of claim 11, wherein the respective unique identifier comprises a respective CRID.
13. The method of claim 10, comprising counting, for the single item, the multiple identical ones among the plurality of query results.